You can customize the panes via Tools -> Global Options...
Panes can be detached. This is very helpful when you want another application next to the pane or behind it, or when you are using multiple monitors, since you can then execute commands on one monitor and watch the output on another.
You also have a spellcheck; use it to catch typos.
Open the Rmd file I sent you: Module01_forClass.Rmd and save it in the code folder. Save the data I sent you to the data folder.
Now we install some packages via Tools -> Install Packages...
devtools, ggplot2, dplyr, reshape2, lubridate, car, Hmisc, gapminder, leaflet, prettydoc, DT, data.table, htmltools, scales, ggridges
Other packages will be installed as needed.
If we need to, we can update packages via Tools -> Check for Package Updates... It is a good idea to update packages regularly. Every now and then an update might break something, but the developer usually fixes it sooner rather than later.
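The same install and update steps can also be done from the Console. The following is a minimal sketch, not the course's required method: it only reports which of the listed packages are not yet installed. The object names `wanted`, `have`, and `missing_pkgs` are made up for illustration, and the install and update calls are left commented out so nothing is changed until you choose to run them.

```r
# Packages named in the list above
wanted <- c("devtools", "ggplot2", "dplyr", "reshape2", "lubridate", "car",
            "Hmisc", "gapminder", "leaflet", "prettydoc", "DT", "data.table",
            "htmltools", "scales", "ggridges")

# installed.packages() returns a matrix whose row names are package names
have <- rownames(installed.packages())

# Packages from the list that are not yet installed
missing_pkgs <- setdiff(wanted, have)
print(missing_pkgs)

# Uncomment to install only what is missing, or to update what is installed:
# install.packages(missing_pkgs)
# update.packages(ask = FALSE)
```

Running this before class is a quick way to confirm that the menu-based installation worked.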
Type getwd() at the > prompt in the Console and see the result.
The folder structure should look like this:
mpa6020
--code
--data
Create a project via File -> New Project, choose Existing Directory, and pick the mpa6020 folder; this gives you mpa6020.Rproj.
Go to New File -> R Markdown ... and enter My First Rmd File as the title and your name as the author.
Click OK.
Go to File -> Save As.. and save it as testing_rmd in the code sub-folder, then click Knit. You may see a message that says some packages need to be installed/updated. Allow these to be installed/updated.
If all goes well, you should see the following output:
As the document knits, watch for error messages
You will see the code chunks have several options that could be invoked. Here are some of the more common ones we will use.
Other options can be found in the cheatsheet available here. There is an excellent R Markdown in RStudio tutorial on Vimeo. If the video does not show up below (because of privacy restrictions), click on it to view it on Vimeo. You may need to sign up (for free) with an email address.
Make sure the data-sets sent to you via Slack are in the data folder. If they are not, the commands that follow will not work. We start by reading a simple comma-separated values (CSV) file and then a tab-delimited file.
df.csv = read.csv("./data/ImportDataCSV.csv", sep = ",", header = TRUE)
df.tab = read.csv("./data/ImportDataTAB.txt", sep = "\t", header = TRUE)
The sep = "," switch says the individual variables are separated by a comma, and the header = TRUE switch indicates that the first row includes variable names. The tab-delimited file needs sep = "\t". If both files were read then Environment should show objects called df.csv and df.tab. If you don’t see these, check the following:
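To see what sep and header do in isolation, here is a small self-contained sketch. It does not use the course files: it writes two tiny stand-in files to a temporary folder (the file names demo.csv and demo.txt and the columns id and score are invented for illustration), confirms they exist at the path R is looking in, and then reads them with the same arguments as above.

```r
# Write two tiny files to a temporary folder (hypothetical stand-ins for the course data)
csv_path <- file.path(tempdir(), "demo.csv")
tab_path <- file.path(tempdir(), "demo.txt")
writeLines(c("id,score", "1,90", "2,85"), csv_path)
writeLines(c("id\tscore", "1\t90", "2\t85"), tab_path)

# Confirm the files exist where R is looking before trying to read them
file.exists(c(csv_path, tab_path))

# sep tells R what separates the columns; header = TRUE uses the first row as names
demo.csv <- read.csv(csv_path, sep = ",", header = TRUE)
demo.tab <- read.csv(tab_path, sep = "\t", header = TRUE)

# Both reads should give the same two-column data frame
names(demo.csv)
all.equal(demo.csv, demo.tab)
```

If file.exists() returns FALSE for one of your own files, the path (or the working directory reported by getwd()) is the first thing to look at.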
Excel files can be read via the readxl package
library(readxl)
df.xls = read_excel("./data/ImportDataXLS.xls")
df.xlsx = read_excel("./data/ImportDataXLSX.xlsx")
SPSS, Stata, SAS files can be read via the haven package
library(haven)
df.stata = read_stata("./data/ImportDataStata.dta")
df.sas = read_sas("./data/ImportDataSAS.sas7bdat")
df.spss = read_sav("./data/ImportDataSPSS.sav")
It is also common to encounter fixed-width files where the raw data are stored without any gaps between successive variables. However, these files will come with documentation that will tell you where each variable starts and ends, along with other details about each variable.
df.fw = read.fwf("./data/fwfdata.txt", widths = c(4, 9, 2, 4), header = FALSE,
    col.names = c("Name", "Month", "Day", "Year"))
Notice we need widths = c() to indicate how many slots each variable takes and then col.names = c() to label the columns since the data file does not have variable names.
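A toy version makes the role of widths easier to see. The sketch below is only an illustration: the two records and the file name fwf_demo.txt are invented, and strip.white = TRUE is an extra argument (passed through to read.table) that I am adding to trim the padding spaces in the Month field.

```r
# Two records with no gaps between fields: Name (4), Month (9), Day (2), Year (4)
fw_path <- file.path(tempdir(), "fwf_demo.txt")
writeLines(c("JohnJanuary  151990",
             "MaryFebruary 031985"), fw_path)

# widths says how many characters each variable occupies, in order
demo.fw <- read.fwf(fw_path, widths = c(4, 9, 2, 4), header = FALSE,
                    col.names = c("Name", "Month", "Day", "Year"),
                    strip.white = TRUE)
str(demo.fw)
```

Changing one of the widths by a single character shifts every later column, which is why the documentation that accompanies a fixed-width file matters so much.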
It is possible to specify the full web-path for a file and read it in, rather than storing a local copy. This is often useful when the file is updated by the source (Census Bureau, Bureau of Labor Statistics, Bureau of Economic Analysis, etc.)
fpe = read.table("http://data.princeton.edu/wws509/datasets/effort.dat")
test = read.table("https://stats.idre.ucla.edu/stat/data/test.txt", header = TRUE)
test.csv = read.csv("https://stats.idre.ucla.edu/stat/data/test.csv", header = TRUE)
library(foreign)
hsb2.spss = read.spss("https://stats.idre.ucla.edu/stat/data/hsb2.sav")
df.hsb2.spss = as.data.frame(hsb2.spss)
Note that hsb2.spss was read with foreign, an alternative package to haven:
foreign calls read.spss
haven calls read_spss
The foreign package will also read Stata and other formats; it was the one I used a lot before defaulting to haven. There are other packages for reading SAS, SPSS, etc. data files: sas7bdat, rio, data.table, xlsx, XLConnect, gdata, etc.
Large files may sit in compressed archives on the web, and R has a neat way of letting you download the file, unzip it, and read it. Why is this useful? Because if these files are updated periodically, you can use the same piece of R code to download, unzip, and read the updated file. The tedious way would be to manually download, unzip, place in the appropriate data folder, and then read it.
temp = tempfile()
download.file("ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NVSS/bridgepop/2016/pcen_v2016_y1016.sas7bdat.zip",
temp)
oursasdata = haven::read_sas(unz(temp, "pcen_v2016_y1016.sas7bdat"))
unlink(temp)
You can save your data in a format that R will recognize, giving it the RData or rdata extension
save(oursasdata, file = "./data/oursasdata.RData")
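save(oursasdata, file = "./data/oursasdata.rdata")
Check your data directory to confirm both files are present. On a case-insensitive file system (the default on Windows and macOS) the two names refer to the same file, so you will see only one.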
Working with the hsb2 data: 200 students from the High School and Beyond study
hsb2 = read.table("https://stats.idre.ucla.edu/stat/data/hsb2.csv", header = TRUE,
    sep = ",")
There are no value labels for the various qualitative/categorical variables (female, race, ses, schtyp, and prog) so we next create these.
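As a warm-up before doing this on hsb2, here is the idea on two toy vectors. This is a sketch with invented values, not the hsb2 data: levels lists the numeric codes that occur in the variable, and labels gives the text each code should display, in the same order.

```r
# A hypothetical numeric coding: 0 = Male, 1 = Female
female <- c(0, 1, 1, 0, 1)
female_f <- factor(female, levels = c(0, 1), labels = c("Male", "Female"))
levels(female_f)
table(female_f)

# A value not listed in levels becomes NA rather than raising an error
ses <- c(1, 2, 3, 9)
ses_f <- factor(ses, levels = 1:3, labels = c("Low", "Middle", "High"))
is.na(ses_f)
```

The second part is worth remembering: if the number of levels does not match the codes in the data, the unmatched codes silently turn into missing values.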
hsb2$female = factor(hsb2$female, levels = c(0, 1), labels = c("Male",
    "Female"))
hsb2$race = factor(hsb2$race, levels = c(1:4), labels = c("Hispanic", "Asian",
    "African American", "White"))
hsb2$ses = factor(hsb2$ses, levels = c(1:3), labels = c("Low", "Middle",
    "High"))
hsb2$schtyp = factor(hsb2$schtyp, levels = c(1:2), labels = c("Public",
    "Private"))
hsb2$prog = factor(hsb2$prog, levels = c(1:3), labels = c("General", "Academic",
    "Vocational"))
I am overwriting each variable, telling R that a variable such as female is stored as numeric with values 0 and 1, and that a 0 should be treated as male and a 1 as female, and so on. There are four values for race, 3 for ses, 2 for schtyp, and 3 for prog, so the mapping has to reflect this. Note that this is just a quick run-through of creating value labels; we will cover this in greater detail in a later module.
Save your work!
Having added labels to the factors in hsb2 we can now save the data for later use.
save(hsb2, file = "./data/hsb2.RData")
Let us test if this R Markdown file will knit to html. If all is good then we can Close Project, and when we do so, RStudio will close the project and reopen in a vanilla session.
Almost all R packages come bundled with data-sets, too many of them to walk you through.
To load data from a package, if you know the data-set’s name, run
library(HistData)
data("Galton")
names(Galton)
## [1] "parent" "child"
or you can run
data("GaltonFamilies", package = "HistData")
names(GaltonFamilies)
## [1] "family"          "father"          "mother"          "midparentHeight"
## [5] "children"        "childNum"        "gender"          "childHeight"
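If you do not know a data-set's name, you can ask R what a package ships with. The sketch below uses the built-in datasets package rather than HistData so that it runs without installing anything; the object name `available` is made up for illustration.

```r
# List the data-sets bundled with a package (here the built-in datasets package)
available <- data(package = "datasets")$results[, "Item"]
head(available)

# Load one of them by name, then inspect its variables
data("mtcars", package = "datasets")
names(mtcars)
```

Swapping "datasets" for another installed package name, such as "HistData", lists that package's data-sets instead.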
You can save your data via
save(dataname, file = "filepath/filename.RData") or
save(dataname, file = "filepath/filename.rdata")
data(mtcars)
save(mtcars, file = "./data/mtcars.RData")
rm(list = ls()) # To clear the Environment
load("./data/mtcars.RData")
You can also save multiple data files as follows:
data(mtcars)
library(ggplot2)
data(diamonds)
save(mtcars, diamonds, file = "./data/mydata.RData")
rm(list = ls()) # To clear the Environment
load("./data/mydata.RData")
If you want to save just a single object from the environment and then load it in a later session, maybe with a different name, then you should use saveRDS() and readRDS()
data(mtcars)
saveRDS(mtcars, file = "./data/mydata.RDS")
rm(list = ls()) # To clear the Environment
ourdata = readRDS("./data/mydata.RDS")
If instead you did the following, the object would come back under the name it had when saved
data(mtcars)
save(mtcars, file = "./data/mtcars.RData")
rm(list = ls()) # To clear the Environment
ourdata = load("./data/mtcars.RData")  # The data come back as mtcars; ourdata only holds the string "mtcars"
If you want to save everything you have done in the work session you can do so via save.image()
save.image(file = "mywork_jan182018.RData")
You can restore this image in a later session with load("mywork_jan182018.RData"). Only an image saved under the default name .RData in the working directory is loaded automatically at startup. Saving an image is useful if you have written a lot of R code and generated various objects and do not want to start from scratch the next time around.
If you are not in a project and you try to close RStudio after some code has been run, you will be prompted to save (or not) the workspace. Say “no” by default unless you want to save the workspace.
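The difference between save()/load() and saveRDS()/readRDS() described earlier can be checked in a few lines. This is a self-contained sketch that writes to a temporary folder; the data frame `scores` and the file names are invented for illustration.

```r
# A small hypothetical data frame to save
scores <- data.frame(id = 1:3, score = c(90, 85, 77))

rdata_path <- file.path(tempdir(), "scores.RData")
rds_path   <- file.path(tempdir(), "scores.RDS")

save(scores, file = rdata_path)     # stores the object together with its name
saveRDS(scores, file = rds_path)    # stores the object only

original <- scores
rm(scores)

# load() recreates 'scores' and returns the names of what it restored
restored_names <- load(rdata_path)
restored_names
exists("scores")

# readRDS() returns the object itself, so you pick the name
ourdata <- readRDS(rds_path)
identical(ourdata, original)
```

In short, use save() when you want objects to reappear under their own names, and saveRDS() when you want to choose the name at read time.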
There are several packages that allow us to build maps in R, from simple to complicated. Of late I have been really fascinated by leaflet, an easy-to-learn JavaScript library that generates interactive maps, so let us see that package in action. Later on, when we move to more advanced visualizations, we will look at a variety of mapping options. For the moment we keep it simple and fun.
library(leaflet)
library(leaflet.extras)
library(widgetframe)
m1 <- leaflet() %>% setView(lat = 39.322577, lng = -82.106336, zoom = 14) %>%
addTiles() %>% setMapWidgetStyle() %>% frameWidget(height = "275")
# saveWidget(m1, 'leaflet.html')
m1
Notice how this was built:
setView() centers the map at the given latitude and longitude, and zoom = picks a reasonable zoom factor. If you set the zoom factor too low you will be seeing the place from outer space, and if too high you might be standing on a street corner, so experiment with it.
Now, since I ended up picking the general area around Richland Avenue, I could drop a marker on Building 21 on The Ridges. This is done with addMarkers, and popup sets what is displayed when someone clicks on the marker.
m2 <- leaflet() %>% setView(lat = 39.322577, lng = -82.106336, zoom = 15) %>%
addMarkers(lat = 39.319984, lng = -82.107084, popup = c("The Ridges, Building 21")) %>%
addTiles() %>% setMapWidgetStyle() %>% frameWidget(height = "275")
# saveWidget(m2, 'leaflet2.html')
m2
The fantastic team at RStudio runs free webinars that are often very helpful, so be sure to sign up with your email. Here are some video recordings of webinars that are relevant to what we have covered so far.
Open a fresh session by launching RStudio and then running File -> Open Project...
Give it a title, your name as the author, and then save it in the code folder with the following name: m1ex1.Rmd
Delete all content after the following code chunk
Add this level 1 heading The Starwars Data and then insert your first code chunk exactly as shown below
library(dplyr)
data(starwars)
str(starwars)
Add this level 2 heading Character Heights and Weights and then your second code chunk
plot(starwars$height, starwars$mass)
Now knit this file to html
Go to this website and generate five Lorem Ipsum placeholder text paragraphs
Using the starwars data, create five code chunks, one after each paragraph
plot(starwars$height, starwars$mass)
Now knit this file to html
Create a new RMarkdown file that is blank after the initial setup code chunk
Insert a code chunk that reads in both these files found on the web
http://www.stata.com/data/jwooldridge/eacsap/mroz.dta
http://calcnet.mth.cmich.edu/org/spss/V16_materials/DataSets_v16/airline_passengers.sav
In a follow-up code chunk, run the summary() command on each data-set
In a separate code chunk, read in this dataset after you download it and save the unzipped file in your data folder.
gender has the following codes: 0 = unknown; 1 = male; 2 = female
Convert gender into a factor with these value labels
In a follow-up chunk run the following commands on this data-set
names()
str()
summary()
In a final chunk, run the commands necessary to save each of the three data-sets as separate RData files. Make sure you save them in your data folder.
Now knit the complete Rmd file to html
I’d like you to use a specific R Markdown format because the resulting html files are very readable
You had installed the prettydoc package so now create a prettydoc Rmd file as shown below:
Now take all the text and code chunks you created in Ex. 3 and insert them in this file. Make sure you add a title, etc. in the YAML and then knit the file to html
You can play with the theme: and highlight: fields, choosing from the options displayed here
You should consider using the prettydoc format unless you want to experiment with other R Markdown templates in RStudio.